Towards Order-Preserving SubMatrix Search and Indexing
Abstract
The Order-Preserving SubMatrix (OPSM) has proved important for modelling biologically meaningful subspace clusters, because it captures the general tendency of gene expressions across a subset of conditions. Given an OPSM query based on row or column keywords, it is desirable to retrieve OPSMs quickly, via indices, from a large gene expression dataset or from OPSM data. However, mining OPSMs from a gene expression dataset takes a long time, and the volume of OPSM data is huge. In this paper, we investigate how to index these two kinds of datasets and first present a naive solution, pfTree, which applies a prefix tree. Because searching this tree is inefficient, we propose an optimized indexing method, pIndex. Unlike pfTree, pIndex employs row and column header tables to traverse the related branches in a bottom-up manner. Further, two pruning rules based on the number and order of keywords are introduced. To reduce the number of column keyword candidates in fuzzy queries, we introduce a First Item of keywords roTation method, FIT, which reduces the number of candidates from n! to n. We conduct extensive experiments with real datasets on a single machine, Hadoop and Hama, and the experimental results show the efficiency and scalability of the proposed techniques.
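The prefix-tree idea behind pfTree and the n! versus n candidate counts mentioned for FIT can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the class and function names are hypothetical, each OPSM is assumed to be stored as an ordered list of column ids plus a set of row ids, the search walks the tree top-down from the root, and FIT is read here as producing the n cyclic rotations of the keyword list (the abstract does not spell out the rotation). The header tables and pruning rules of pIndex are not modelled.

```python
from itertools import permutations


class PrefixTreeNode:
    """One node of a hypothetical prefix tree over OPSM column sequences."""

    def __init__(self):
        self.children = {}  # column id -> child node
        self.rows = set()   # rows of OPSMs whose column sequence ends here


class PrefixTree:
    def __init__(self):
        self.root = PrefixTreeNode()

    def insert(self, columns, rows):
        """Store one OPSM: an ordered column sequence and its set of rows."""
        node = self.root
        for col in columns:
            node = node.children.setdefault(col, PrefixTreeNode())
        node.rows.update(rows)

    def search_prefix(self, columns):
        """Return the rows of every stored OPSM whose column sequence
        starts with `columns` (top-down walk from the root)."""
        node = self.root
        for col in columns:
            node = node.children.get(col)
            if node is None:
                return set()
        found = set()
        stack = [node]
        while stack:  # collect rows from the whole matching subtree
            current = stack.pop()
            found |= current.rows
            stack.extend(current.children.values())
        return found


def cyclic_rotations(keywords):
    """Assumed reading of FIT: n rotations instead of n! permutations."""
    return [keywords[i:] + keywords[:i] for i in range(len(keywords))]


if __name__ == "__main__":
    tree = PrefixTree()
    tree.insert(["c1", "c3", "c2"], {"g1", "g2"})
    tree.insert(["c1", "c3", "c4"], {"g2", "g5"})
    tree.insert(["c2", "c1"], {"g7"})

    print(sorted(tree.search_prefix(["c1", "c3"])))
    keywords = ["c1", "c2", "c3", "c4"]
    print(len(list(permutations(keywords))), len(cyclic_rotations(keywords)))
```

For four column keywords the sketch compares 24 permutations with 4 rotations, which is the kind of reduction the abstract describes. A query whose first keyword is not a child of the root returns nothing here, which suggests why a root-first search alone may be a poor fit for keyword queries.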
Similar resources
OMEGA: An Order-Preserving SubMatrix Mining, Indexing and Search Tool
Order-Preserving SubMatrix (OPSM) has been accepted as a significant tool for modelling biologically meaningful subspace clusters and for discovering the general tendency of gene expressions across a subset of conditions. Existing OPSM processing tools focus on one or several batch mining techniques; they are time-consuming and do not support OPSM queries. To address the problems, the paper...
A Divisive Hierarchical Clustering-Based Method for Indexing Image Information
It is conventional to use multi-dimensional indexing structures to accelerate search operations in content-based image retrieval systems. Much effort has been devoted to developing multi-dimensional indexing structures so far. In most practical applications of image retrieval, high-dimensional feature vectors are required, but current multi-dimensional indexing structures lose their effici...
A Comparing between the impacts of text based indexing and folksonomy on ranking of images search via Google search engine
Background and Aim: The purpose of this study was to compare the impact of text-based indexing and folksonomy on image retrieval via the Google search engine. Methods: This study used an experimental method. The sample consisted of 30 images extracted from the book “Gray anatomy”. The research was carried out in 4 stages; in the first stage, images were uploaded to an “Instagram” account so the images are tagge...
Algorithmic and Complexity Issues of Three Clustering Methods in Microarray Data Analysis
The complexity, approximation and algorithmic issues of several clustering problems are studied. These non-traditional clustering problems arise from recent studies in microarray data analysis. We prove the following results. (1) Two variants of the Order-Preserving Submatrix problem are NP-hard. There are polynomial-time algorithms for the Order-Preserving Submatrix problem when the condition ...
Solving the Order-Preserving Submatrix Problem via Integer Programming
In this paper we consider the Order Preserving Submatrix (OPSM) problem. This problem is known to be NP-hard. Although in recent years some heuristic methods have been presented to find OPSMs, they lack the guarantee of optimality. We present exact solution approaches based on linear mixed 0–1 programming formulations, and develop algorithmic enhancements to aid in solvability. Encouraging com...
Publication date: 2015